Speech recognition overview

Speech recognition (SR) is the ability of the operating system to convert spoken words to written text. An internal driver, called an SR engine, recognizes words and converts them to text. The SR engine may be installed with the operating system or at a later time with other software. During installation, speech-enabled packages such as word processors and web browsers may install their own engines or use existing ones. Additional engines are also available from third-party manufacturers. These engines often use a specialized vocabulary, for example one covering medical or legal terminology. They can also be tailored to regional accents such as British English, or recognize a different language altogether, such as German, French, or Russian.

You need a microphone or some other sound input device to receive the sound. In general, the microphone should be a high-quality device with built-in noise filters. The recognition rate is directly related to the quality of the input, so with a poor microphone it will be significantly lower and perhaps even unacceptable. The Microsoft Speech Recognition Training Wizard (Voice Training Wizard) guides you through setting up the microphone, recommends the best position for it, and lets you test it for optimal results.

Once you have installed the system and it is working, it is important to train it for your environment and speaking style. On the Speech Recognition tab, click Train Profile and use the Voice Training Wizard to train the system to recognize background noises such as a fan, the hum of air conditioning, or other office sounds. Training also adapts the system to your speaking style, including your accent, pronunciation, and even idiomatic phrases.

Speech recognition tips

Speech recognition is not designed for completely hands-free operation; you'll get the best results if you use a combination of your voice and the mouse or keyboard. Speaking in a consistent manner also improves results. In conversation, people usually understand one another from context and environment, even when words are whispered, shouted, or spoken quickly or slowly. Speech recognition, however, understands words best when they are spoken in a predictable manner.